In this project, we shall be conducting a descriptive analysis of drought disasters in Africa, to understand the magnitude and impact of drought across various dimensions.
The EM-DAT database records mass disasters as well as their health and economic impacts at a country level. The database contains core data on the occurrence and effects of 26,000 disasters worldwide from 1900 to the present, and is managed and distributed by the Centre for Research on the Epidemiology of Disasters (CRED). It is compiled from various sources of information, including UN agencies, non-governmental organizations, insurance companies, research institutes, and press agencies. The dataset used for this project was filtered specifically for drought disasters in Africa, and consists of 168 observations of drought disasters between 1999 and 2022. Detailed documentation of the data and a glossary of each feature can be found on the EM-DAT website.
This descriptive analysis aims to answer the following questions about drought as a natural disaster in Africa:
We shall use this cell to import necessary libraries and packages that we need to analyse this data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
import plotly.express as px
%matplotlib inline
# If we do not currently have a library or package installed on our current device, we can quickly install them using pip as shown below:
# UNCOMMENT NEXT LINE TO INSTALL PLOTLY LIBRARY. This requires an internet connection.
# You don't have to run it next time once you have the library installed. Please kindly wait for the install to complete.
# !pip install plotly
Now, let's load the data which we cleaned earlier and saved as "drought_data_cleaned.csv". We also want to ensure that each column is read with the right data type as we load it into this notebook.
df = pd.read_csv(
    'drought_data_cleaned.csv',
    dtype={
        "Total Affected": 'int64',
        "Country": 'category',
        "Subregion": 'category',
        "Associated Types": 'category',
        "OFDA Response": 'category',
        "Appeal": 'category',
        "Declaration": 'category',
        "Start Year": 'category',
        "Start Month": 'category',
        "End Year": 'category',
        "End Month": 'category',
        "UN Sub Region": 'category',
        "Income group": 'category',
    },
)
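As a minimal sketch of how the `dtype` argument behaves, the snippet below reads a small in-memory CSV instead of the real file. The three sample rows and the `sample` variable are illustrative assumptions, not part of the project data pipeline.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for drought_data_cleaned.csv
sample_csv = io.StringIO(
    "Total Affected,Country,Start Year\n"
    "100000,Djibouti,2001\n"
    "2000000,Sudan,2000\n"
    "1200000,Somalia,2000\n"
)

sample = pd.read_csv(
    sample_csv,
    dtype={"Total Affected": "int64", "Country": "category", "Start Year": "category"},
)

# Confirm that each column was parsed with the requested type
print(sample.dtypes)
```

Declaring the types at load time keeps the counts as integers and stores repeated labels such as country names compactly as categories.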
Let's print out a few lines of the table to see if it was correctly loaded. In this case, we shall print out only the first five lines.
df.head(n=5)
| | Total Affected | Country | Subregion | ISO | Origin | Associated Types | OFDA Response | Appeal | Declaration | Start Year | Start Month | End Year | End Month | UN Sub Region | Income group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100000 | Djibouti | Sub-Saharan Africa | DJI | Not specified | Not specified | Yes | No | No | 2001 | 6.0 | 2001 | 0.0 | Eastern Africa | Middle Income |
| 1 | 2000000 | Sudan | Northern Africa | SDN | Not specified | Food shortage\|Water shortage | No | No | No | 2000 | 1.0 | 2001 | 0.0 | Northern Africa | Middle Income |
| 2 | 1200000 | Somalia | Sub-Saharan Africa | SOM | Not specified | Food shortage | No | No | No | 2000 | 1.0 | 2001 | 0.0 | Eastern Africa | Low Income |
| 3 | 231290 | Madagascar | Sub-Saharan Africa | MDG | Not specified | Not specified | No | No | No | 2000 | 6.0 | 2000 | 0.0 | Eastern Africa | Low Income |
| 4 | 0 | Burkina Faso | Sub-Saharan Africa | BFA | Not specified | Not specified | No | No | No | 2001 | 4.0 | 2001 | 0.0 | Western Africa | Low Income |
The code cell below will provide additional information about the data, such as the number of columns, how many entries are missing in each column, and the data type of the column.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 168 entries, 0 to 167
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Total Affected    168 non-null    int64
 1   Country           168 non-null    category
 2   Subregion         168 non-null    category
 3   ISO               168 non-null    object
 4   Origin            168 non-null    object
 5   Associated Types  168 non-null    category
 6   OFDA Response     168 non-null    category
 7   Appeal            168 non-null    category
 8   Declaration       168 non-null    category
 9   Start Year        168 non-null    category
 10  Start Month       168 non-null    category
 11  End Year          168 non-null    category
 12  End Month         168 non-null    category
 13  UN Sub Region     168 non-null    category
 14  Income group      168 non-null    category
dtypes: category(12), int64(1), object(2)
memory usage: 11.1+ KB
Now that we've trimmed and cleaned our data, we're ready to move on to exploration. It's now time to compute statistics and create visualizations with the goal of addressing the research questions that we posed in the Introduction section. We shall be systematic with our approach, by looking at one variable at a time, and then following it up by looking at relationships between variables.
Distribution per region
The code below looks at the "UN Sub Region" column of our data, to tell us how many unique entries are represented. In this case, there are 5 Sub Regions represented, following the UN regional classification.
df['UN Sub Region'].unique()
['Eastern Africa', 'Northern Africa', 'Western Africa', 'Middle Africa', 'Southern Africa']
Categories (5, object): ['Eastern Africa', 'Middle Africa', 'Northern Africa', 'Southern Africa', 'Western Africa']
Now, let's find out how many persons were affected in each subregion for the period under review. Note that the first line of code is split by the "=" sign. To the left of the "=" sign is the variable, which is like a container that stores the result of our operation. To the right is the actual Python expression that performs the operation. The names of the variables have been made descriptive enough to give an idea of the content we want each one to store for us.
# numeric_only=True restricts the sum to numeric columns (here, "Total Affected")
count_by_subregion = df.groupby("UN Sub Region").sum(numeric_only=True).reset_index()
count_by_subregion
| | UN Sub Region | Total Affected |
|---|---|---|
| 0 | Eastern Africa | 227248956 |
| 1 | Middle Africa | 44340003 |
| 2 | Northern Africa | 17839300 |
| 3 | Southern Africa | 38461515 |
| 4 | Western Africa | 86394334 |
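The group-and-sum step can be tried on a toy frame. This is a minimal sketch under assumed data: the three rows and the names `toy` and `toy_totals` are hypothetical, and selecting the numeric column before summing is one possible way to make sure text columns are never aggregated.

```python
import pandas as pd

# Hypothetical toy rows; the column names mirror the notebook's dataframe
toy = pd.DataFrame({
    "UN Sub Region": ["Eastern Africa", "Eastern Africa", "Western Africa"],
    "Country": ["Djibouti", "Somalia", "Burkina Faso"],
    "Total Affected": [100000, 1200000, 0],
})

# Select the numeric column before summing so text columns are never aggregated
toy_totals = toy.groupby("UN Sub Region")["Total Affected"].sum().reset_index()
print(toy_totals)
```

The call to `reset_index()` turns the group labels back into an ordinary column, which is the shape the plotting functions expect.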
The cell above shows that Eastern Africa was the worst hit in terms of the number of persons affected by drought. Let's see how it compares with the other regions. We shall first compute the percentages, then plot a donut chart.
# compute percentages for each subregion
count_by_subregion['Percentage'] = (count_by_subregion['Total Affected'] / sum(count_by_subregion['Total Affected'])) * 100
count_by_subregion
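The percentage calculation can be checked by hand. The sketch below repeats the arithmetic in plain Python using the subregion totals from the aggregated table above; the names `totals`, `grand_total` and `shares` are illustrative.

```python
# Totals per UN sub-region, copied from the aggregated table above
totals = {
    "Eastern Africa": 227248956,
    "Middle Africa": 44340003,
    "Northern Africa": 17839300,
    "Southern Africa": 38461515,
    "Western Africa": 86394334,
}

grand_total = sum(totals.values())

# Share of each sub-region, as a percentage rounded to one decimal place
shares = {region: round(count / grand_total * 100, 1) for region, count in totals.items()}

for region, share in shares.items():
    print(f"{region}: {share}%")
```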
The code cell below plots a donut chart for us using the plotly library:
template = "ggplot2"
count_by_subregion_donut = px.pie(data_frame=count_by_subregion,
values='Total Affected',
width = 600,
height = 600,
names = "UN Sub Region",
title="Number of persons affected per UN Subregion",
hole= 0.7,
template = template,
)
count_by_subregion_donut.update_layout(
legend=dict(x=1, y=0))
OBSERVATION: The visual above shows that 54.9% of the persons affected come from Eastern Africa. This means that more than one in every two persons affected by drought in Africa is from Eastern Africa.
Distribution by Country
Now, to be more specific, let's see which countries were hardest hit:
count_by_country = df.groupby("Country").sum(numeric_only=True).sort_values(by='Total Affected', ascending=False).reset_index()
count_by_country["Percentage of Total Affected"] = count_by_country["Total Affected"]/ count_by_country["Total Affected"].sum() * 100
count_by_country
| | Country | Total Affected | Percentage of Total Affected |
|---|---|---|---|
| 0 | Ethiopia | 74705679 | 18.032475 |
| 1 | South Africa | 30450000 | 7.350028 |
| 2 | Kenya | 29750000 | 7.181062 |
| 3 | Niger | 29303986 | 7.073403 |
| 4 | Somalia | 27535624 | 6.646556 |
| 5 | Democratic Republic of the Congo | 25972806 | 6.269322 |
| 6 | Zimbabwe | 21135118 | 5.101600 |
| 7 | Malawi | 19727628 | 4.761860 |
| 8 | Nigeria | 19110398 | 4.612873 |
| 9 | Sudan | 17839300 | 4.306055 |
| 10 | South Sudan | 15623670 | 3.771245 |
| 11 | Mali | 13660753 | 3.297436 |
| 12 | Burkina Faso | 13250928 | 3.198512 |
| 13 | Mozambique | 10262271 | 2.477110 |
| 14 | United Republic of Tanzania | 8954000 | 2.161319 |
| 15 | Chad | 8822162 | 2.129496 |
| 16 | Mauritania | 8205374 | 1.980615 |
| 17 | Madagascar | 5665290 | 1.367489 |
| 18 | Angola | 4922216 | 1.188126 |
| 19 | Zambia | 4210000 | 1.016211 |
| 20 | Lesotho | 3608515 | 0.871024 |
| 21 | Uganda | 3542000 | 0.854969 |
| 22 | Burundi | 2412500 | 0.582330 |
| 23 | Cameroon | 2401127 | 0.579585 |
| 24 | Namibia | 2261000 | 0.545761 |
| 25 | Central African Republic | 2221692 | 0.536273 |
| 26 | Eswatini | 2104000 | 0.507864 |
| 27 | Senegal | 2093702 | 0.505378 |
| 28 | Eritrea | 1700000 | 0.410346 |
| 29 | Djibouti | 1025176 | 0.247457 |
| 30 | Rwanda | 1000000 | 0.241380 |
| 31 | Gambia | 491100 | 0.118542 |
| 32 | Cabo Verde | 146093 | 0.035264 |
| 33 | Guinea-Bissau | 132000 | 0.031862 |
| 34 | Botswana | 38000 | 0.009172 |
Let's visualize this as a bar chart:
count_by_country_top_15 = count_by_country.nlargest(columns='Percentage of Total Affected', n=15)
fig_count_by_country_top_15 = px.bar(data_frame = count_by_country_top_15,
x='Percentage of Total Affected',
y='Country',
text= count_by_country_top_15["Percentage of Total Affected"],
template = 'ggplot2',
)
fig_count_by_country_top_15.show()
OBSERVATION: From the above analysis, the five countries hardest hit in terms of the number of persons affected by drought were Ethiopia (18.0%), South Africa (7.4%), Kenya (7.2%), Niger (7.1%) and Somalia (6.6%).
Interestingly, three of the top five most affected countries (Ethiopia, Kenya and Somalia) are in Eastern Africa. Why exactly is Eastern Africa most affected by drought? Could it be a geographical factor, or the result of poor emergency response on the part of governments?
This question calls for further investigation, as our data cannot answer it.
Now, let's visualize this as a map and see how the impact of drought is distributed across the continent. We shall start with a basic, empty map.
The map below only shows African countries represented in our data. Countries shaded in grey are those for which we have no data.
basic_map = px.choropleth(data_frame=df, locations="ISO",
locationmode="ISO-3",
scope='africa',
color='UN Sub Region',
)
basic_map.show()
Now that we know how to plot a map, let's feed the map with actual data to see where each country belongs in terms of the number of persons affected by drought.
# Create a table of countries and their corresponding total number of affected persons
map_df = df.groupby('Country').sum(numeric_only=True).reset_index()
map_df
| | Country | Total Affected |
|---|---|---|
| 0 | Angola | 4922216 |
| 1 | Botswana | 38000 |
| 2 | Burkina Faso | 13250928 |
| 3 | Burundi | 2412500 |
| 4 | Cabo Verde | 146093 |
| 5 | Cameroon | 2401127 |
| 6 | Central African Republic | 2221692 |
| 7 | Chad | 8822162 |
| 8 | Democratic Republic of the Congo | 25972806 |
| 9 | Djibouti | 1025176 |
| 10 | Eritrea | 1700000 |
| 11 | Eswatini | 2104000 |
| 12 | Ethiopia | 74705679 |
| 13 | Gambia | 491100 |
| 14 | Guinea-Bissau | 132000 |
| 15 | Kenya | 29750000 |
| 16 | Lesotho | 3608515 |
| 17 | Madagascar | 5665290 |
| 18 | Malawi | 19727628 |
| 19 | Mali | 13660753 |
| 20 | Mauritania | 8205374 |
| 21 | Mozambique | 10262271 |
| 22 | Namibia | 2261000 |
| 23 | Niger | 29303986 |
| 24 | Nigeria | 19110398 |
| 25 | Rwanda | 1000000 |
| 26 | Senegal | 2093702 |
| 27 | Somalia | 27535624 |
| 28 | South Africa | 30450000 |
| 29 | South Sudan | 15623670 |
| 30 | Sudan | 17839300 |
| 31 | Uganda | 3542000 |
| 32 | United Republic of Tanzania | 8954000 |
| 33 | Zambia | 4210000 |
| 34 | Zimbabwe | 21135118 |
Next, let's create a temporary dataframe containing some columns of our main dataframe so that we can join them to the aggregated dataframe for the map
temp_df = df[['Country', 'UN Sub Region', 'Income group', 'ISO']]
temp_df = temp_df.drop_duplicates()
temp_df
| | Country | UN Sub Region | Income group | ISO |
|---|---|---|---|---|
| 0 | Djibouti | Eastern Africa | Middle Income | DJI |
| 1 | Sudan | Northern Africa | Middle Income | SDN |
| 2 | Somalia | Eastern Africa | Low Income | SOM |
| 3 | Madagascar | Eastern Africa | Low Income | MDG |
| 4 | Burkina Faso | Western Africa | Low Income | BFA |
| 5 | Mali | Western Africa | Low Income | MLI |
| 6 | Niger | Western Africa | Low Income | NER |
| 7 | Chad | Middle Africa | Low Income | TCD |
| 8 | Mozambique | Eastern Africa | Low Income | MOZ |
| 9 | Cameroon | Middle Africa | Middle Income | CMR |
| 11 | Eswatini | Southern Africa | Middle Income | SWZ |
| 12 | Zimbabwe | Eastern Africa | Low Income | ZWE |
| 13 | Angola | Middle Africa | Middle Income | AGO |
| 14 | Mauritania | Western Africa | Middle Income | MRT |
| 15 | Namibia | Southern Africa | Middle Income | NAM |
| 17 | Malawi | Eastern Africa | Low Income | MWI |
| 18 | Guinea-Bissau | Western Africa | Low Income | GNB |
| 19 | Lesotho | Southern Africa | Middle Income | LSO |
| 20 | Cabo Verde | Western Africa | Middle Income | CPV |
| 21 | Senegal | Western Africa | Low Income | SEN |
| 22 | Gambia | Western Africa | Low Income | GMB |
| 24 | Uganda | Eastern Africa | Low Income | UGA |
| 26 | Burundi | Eastern Africa | Low Income | BDI |
| 27 | Rwanda | Eastern Africa | Low Income | RWA |
| 28 | United Republic of Tanzania | Eastern Africa | Low Income | TZA |
| 29 | Ethiopia | Eastern Africa | Low Income | ETH |
| 30 | South Africa | Southern Africa | Middle Income | ZAF |
| 32 | Kenya | Eastern Africa | Middle Income | KEN |
| 42 | Zambia | Eastern Africa | Middle Income | ZMB |
| 57 | Eritrea | Eastern Africa | Low Income | ERI |
| 67 | South Sudan | Eastern Africa | Low Income | SSD |
| 105 | Botswana | Southern Africa | Middle Income | BWA |
| 162 | Central African Republic | Middle Africa | Low Income | CAF |
| 163 | Democratic Republic of the Congo | Middle Africa | Low Income | COD |
| 166 | Nigeria | Western Africa | Middle Income | NGA |
Now, we can join our temp_df with the map_df dataframe using the "Country" common column:
map_df = pd.merge(left=map_df, right=temp_df, on="Country", how='left')
map_df
| | Country | Total Affected | UN Sub Region | Income group | ISO |
|---|---|---|---|---|---|
| 0 | Angola | 4922216 | Middle Africa | Middle Income | AGO |
| 1 | Botswana | 38000 | Southern Africa | Middle Income | BWA |
| 2 | Burkina Faso | 13250928 | Western Africa | Low Income | BFA |
| 3 | Burundi | 2412500 | Eastern Africa | Low Income | BDI |
| 4 | Cabo Verde | 146093 | Western Africa | Middle Income | CPV |
| 5 | Cameroon | 2401127 | Middle Africa | Middle Income | CMR |
| 6 | Central African Republic | 2221692 | Middle Africa | Low Income | CAF |
| 7 | Chad | 8822162 | Middle Africa | Low Income | TCD |
| 8 | Democratic Republic of the Congo | 25972806 | Middle Africa | Low Income | COD |
| 9 | Djibouti | 1025176 | Eastern Africa | Middle Income | DJI |
| 10 | Eritrea | 1700000 | Eastern Africa | Low Income | ERI |
| 11 | Eswatini | 2104000 | Southern Africa | Middle Income | SWZ |
| 12 | Ethiopia | 74705679 | Eastern Africa | Low Income | ETH |
| 13 | Gambia | 491100 | Western Africa | Low Income | GMB |
| 14 | Guinea-Bissau | 132000 | Western Africa | Low Income | GNB |
| 15 | Kenya | 29750000 | Eastern Africa | Middle Income | KEN |
| 16 | Lesotho | 3608515 | Southern Africa | Middle Income | LSO |
| 17 | Madagascar | 5665290 | Eastern Africa | Low Income | MDG |
| 18 | Malawi | 19727628 | Eastern Africa | Low Income | MWI |
| 19 | Mali | 13660753 | Western Africa | Low Income | MLI |
| 20 | Mauritania | 8205374 | Western Africa | Middle Income | MRT |
| 21 | Mozambique | 10262271 | Eastern Africa | Low Income | MOZ |
| 22 | Namibia | 2261000 | Southern Africa | Middle Income | NAM |
| 23 | Niger | 29303986 | Western Africa | Low Income | NER |
| 24 | Nigeria | 19110398 | Western Africa | Middle Income | NGA |
| 25 | Rwanda | 1000000 | Eastern Africa | Low Income | RWA |
| 26 | Senegal | 2093702 | Western Africa | Low Income | SEN |
| 27 | Somalia | 27535624 | Eastern Africa | Low Income | SOM |
| 28 | South Africa | 30450000 | Southern Africa | Middle Income | ZAF |
| 29 | South Sudan | 15623670 | Eastern Africa | Low Income | SSD |
| 30 | Sudan | 17839300 | Northern Africa | Middle Income | SDN |
| 31 | Uganda | 3542000 | Eastern Africa | Low Income | UGA |
| 32 | United Republic of Tanzania | 8954000 | Eastern Africa | Low Income | TZA |
| 33 | Zambia | 4210000 | Eastern Africa | Middle Income | ZMB |
| 34 | Zimbabwe | 21135118 | Eastern Africa | Low Income | ZWE |
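A left join like the one above silently duplicates rows if the lookup table contains a country more than once. As a minimal sketch, assuming toy frames shaped like `map_df` and `temp_df` (the names `totals`, `lookup` and `merged` are hypothetical), the `validate` argument of `pd.merge` can guard against that:

```python
import pandas as pd

# Hypothetical toy frames shaped like map_df and temp_df
totals = pd.DataFrame({"Country": ["Kenya", "Niger"], "Total Affected": [29750000, 29303986]})
lookup = pd.DataFrame({
    "Country": ["Kenya", "Niger", "Sudan"],
    "ISO": ["KEN", "NER", "SDN"],
})

# how="left" keeps every row of `totals`; validate raises MergeError
# if a country appears more than once on either side
merged = pd.merge(left=totals, right=lookup, on="Country", how="left", validate="one_to_one")
print(merged)
```

Here Sudan is dropped because it has no row in `totals`, while both rows of `totals` gain their ISO code.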
Now we have the aggregated table containing the other values that we can use for our map. Next, we want to group the "Total Affected" column into intervals reflecting the severity of occurrence. To choose sensible intervals, let's first look at the summary statistics:
map_df.describe()
| | Total Affected |
|---|---|
| count | 3.500000e+01 |
| mean | 1.183669e+07 |
| std | 1.472545e+07 |
| min | 3.800000e+04 |
| 25% | 2.162846e+06 |
| 50% | 5.665290e+06 |
| 75% | 1.847485e+07 |
| max | 7.470568e+07 |
# The box plot below shows us the distribution of the data.
# This will help us identify the outliers and adequately group the data into intervals
px.box(data_frame=map_df, x='Total Affected')
Split the "Total Affected" column into intervals
bin_edges = [0, 5000000, 15000000, 30000000, map_df["Total Affected"].max()]
bin_labels = ["Low Severity (0-5M)", "Moderate Severity (>5M - 15M)", "High Severity (>15M - 30M)", "Critical Severity (>30M)"]
map_df["Severity Level"] = pd.cut(x=map_df["Total Affected"], bins=bin_edges, labels=bin_labels, right=True)
map_df
| | Country | Total Affected | UN Sub Region | Income group | ISO | Severity Level |
|---|---|---|---|---|---|---|
| 0 | Angola | 4922216 | Middle Africa | Middle Income | AGO | Low Severity (0-5M) |
| 1 | Botswana | 38000 | Southern Africa | Middle Income | BWA | Low Severity (0-5M) |
| 2 | Burkina Faso | 13250928 | Western Africa | Low Income | BFA | Moderate Severity (>5M - 15M) |
| 3 | Burundi | 2412500 | Eastern Africa | Low Income | BDI | Low Severity (0-5M) |
| 4 | Cabo Verde | 146093 | Western Africa | Middle Income | CPV | Low Severity (0-5M) |
| 5 | Cameroon | 2401127 | Middle Africa | Middle Income | CMR | Low Severity (0-5M) |
| 6 | Central African Republic | 2221692 | Middle Africa | Low Income | CAF | Low Severity (0-5M) |
| 7 | Chad | 8822162 | Middle Africa | Low Income | TCD | Moderate Severity (>5M - 15M) |
| 8 | Democratic Republic of the Congo | 25972806 | Middle Africa | Low Income | COD | High Severity (>15M - 30M) |
| 9 | Djibouti | 1025176 | Eastern Africa | Middle Income | DJI | Low Severity (0-5M) |
| 10 | Eritrea | 1700000 | Eastern Africa | Low Income | ERI | Low Severity (0-5M) |
| 11 | Eswatini | 2104000 | Southern Africa | Middle Income | SWZ | Low Severity (0-5M) |
| 12 | Ethiopia | 74705679 | Eastern Africa | Low Income | ETH | Critical Severity (>30M) |
| 13 | Gambia | 491100 | Western Africa | Low Income | GMB | Low Severity (0-5M) |
| 14 | Guinea-Bissau | 132000 | Western Africa | Low Income | GNB | Low Severity (0-5M) |
| 15 | Kenya | 29750000 | Eastern Africa | Middle Income | KEN | High Severity (>15M - 30M) |
| 16 | Lesotho | 3608515 | Southern Africa | Middle Income | LSO | Low Severity (0-5M) |
| 17 | Madagascar | 5665290 | Eastern Africa | Low Income | MDG | Moderate Severity (>5M - 15M) |
| 18 | Malawi | 19727628 | Eastern Africa | Low Income | MWI | High Severity (>15M - 30M) |
| 19 | Mali | 13660753 | Western Africa | Low Income | MLI | Moderate Severity (>5M - 15M) |
| 20 | Mauritania | 8205374 | Western Africa | Middle Income | MRT | Moderate Severity (>5M - 15M) |
| 21 | Mozambique | 10262271 | Eastern Africa | Low Income | MOZ | Moderate Severity (>5M - 15M) |
| 22 | Namibia | 2261000 | Southern Africa | Middle Income | NAM | Low Severity (0-5M) |
| 23 | Niger | 29303986 | Western Africa | Low Income | NER | High Severity (>15M - 30M) |
| 24 | Nigeria | 19110398 | Western Africa | Middle Income | NGA | High Severity (>15M - 30M) |
| 25 | Rwanda | 1000000 | Eastern Africa | Low Income | RWA | Low Severity (0-5M) |
| 26 | Senegal | 2093702 | Western Africa | Low Income | SEN | Low Severity (0-5M) |
| 27 | Somalia | 27535624 | Eastern Africa | Low Income | SOM | High Severity (>15M - 30M) |
| 28 | South Africa | 30450000 | Southern Africa | Middle Income | ZAF | Critical Severity (>30M) |
| 29 | South Sudan | 15623670 | Eastern Africa | Low Income | SSD | High Severity (>15M - 30M) |
| 30 | Sudan | 17839300 | Northern Africa | Middle Income | SDN | High Severity (>15M - 30M) |
| 31 | Uganda | 3542000 | Eastern Africa | Low Income | UGA | Low Severity (0-5M) |
| 32 | United Republic of Tanzania | 8954000 | Eastern Africa | Low Income | TZA | Moderate Severity (>5M - 15M) |
| 33 | Zambia | 4210000 | Eastern Africa | Middle Income | ZMB | Low Severity (0-5M) |
| 34 | Zimbabwe | 21135118 | Eastern Africa | Low Income | ZWE | High Severity (>15M - 30M) |
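To see how `pd.cut` assigns values near the bin edges, here is a minimal sketch using three totals taken from the table above. The shortened labels and the names `affected`, `edges` and `levels` are illustrative assumptions.

```python
import pandas as pd

# Three totals taken from the table above: Botswana, Kenya and South Africa
affected = pd.Series([38000, 29750000, 30450000], index=["Botswana", "Kenya", "South Africa"])

edges = [0, 5_000_000, 15_000_000, 30_000_000, affected.max()]
labels = ["Low", "Moderate", "High", "Critical"]  # shortened labels, for illustration only

# right=True makes each interval closed on the right: (0, 5M], (5M, 15M], ...
levels = pd.cut(affected, bins=edges, labels=labels, right=True)
print(levels)
```

Because the first interval is open on the left, a total of exactly 0 would not fall into any bin; `pd.cut` accepts `include_lowest=True` for cases where that matters.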
Finally, we plot the map with actual data.
color_discrete_map = {
"Critical Severity (>30M)": "#A70100",
"High Severity (>15M - 30M)": "#D93F00",
"Moderate Severity (>5M - 15M)": "#FD8E2A",
"Low Severity (0-5M)": "#FFD983"
}
map_plot = px.choropleth(data_frame=map_df, locations="ISO", locationmode="ISO-3", scope='africa',
                         color='Severity Level', color_discrete_map=color_discrete_map,
                         # hover_data takes a list of column names from data_frame
                         hover_data=["Severity Level", "Country", "Total Affected", "Income group", "UN Sub Region"],
                         height=600, width=1000)
# update layout
map_plot.update_layout(title="Drought Severity Level by Country",
margin={"r":0, "t":40, "l":0, "b":0},
legend=dict(x=0, y=0))
map_plot.show()
Distribution of persons affected by drought by income group of the country
count_by_income = df.groupby("Income group").sum(numeric_only=True).reset_index()
count_by_income
| | Income group | Total Affected |
|---|---|---|
| 0 | Low Income | 288212909 |
| 1 | Middle Income | 126071199 |
Let's compute the percentage for each Income group:
count_by_income['Percentage'] = (count_by_income['Total Affected'] / sum(count_by_income['Total Affected'])) * 100
count_by_income
| | Income group | Total Affected | Percentage |
|---|---|---|---|
| 0 | Low Income | 288212909 | 69.568903 |
| 1 | Middle Income | 126071199 | 30.431097 |
We can also visualize this:
px.bar(data_frame = count_by_income, x = "Income group", y = "Total Affected", text="Percentage")
OBSERVATION: The bar chart shows that the number of persons affected in low-income countries is more than twice the number affected in middle-income countries. This suggests that a country's income level may be related to the number of persons affected by drought, although a descriptive comparison like this cannot by itself establish a causal link.
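The "more than two times" comparison can be verified directly from the income-group totals in the table above. A minimal sketch in plain Python, with illustrative variable names:

```python
# Totals per income group, copied from the table above
low_income = 288212909
middle_income = 126071199

ratio = low_income / middle_income
low_share = low_income / (low_income + middle_income) * 100

print(f"Low-income total is {ratio:.2f} times the middle-income total")
print(f"Low-income share: {low_share:.1f}%")
```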